Abstract: In this letter, we present a widely-linear minimum 
mean square error (WL-MMSE) precoding scheme employing 
real-valued transmit symbols for downlink large-scale multi-user 
multiple-input single-output (MU-MISO) systems. In contrast to 
the existing WL-MMSE transceivers for single-user multiple- 
input multiple-output (SU-MIMO) systems, which use both WL 
precoders and WL detectors, the proposed scheme uses WL 
precoding only and simple conventional detection at the user 
terminals (UTs). Moreover, to avoid the computational complexity 
associated with inversion of large matrices, we modify the WL- 
MMSE precoder using polynomial expansion (PE). Our simulation 
results show that in overloaded systems, where the number 
of UTs is larger than the number of base station antennas, the 
proposed PE WL-MMSE precoder with only a few terms in the 
matrix polynomial achieves a substantially higher sum rate than 
systems employing conventional MMSE precoding. Hence, more 
UTs sharing the same time/frequency resources can be served 
in a cell. We validate our simulation results with an analytical 
expression for the asymptotic sum rate which is obtained by using 
results from random matrix theory. 

Index Terms: Large-scale MU-MISO systems, precoding, 
widely-linear filtering, polynomial expansion, large system analysis. 

I. Introduction 

Multiple-input multiple-output (MIMO) technology 
enables a substantial increase in spectral efficiency and 
transmission reliability in wireless communication systems. 
An emerging research field in MIMO communications is that of 
so-called large-scale MIMO systems, where base stations are 
equipped with a large number of antennas, e.g., a hundred or 
more. Large-scale MIMO systems enable very high spectral 
and power efficiencies [1]. 

In this letter, we consider the downlink (DL) of a large-scale 
multi-user multiple-input single-output (MU-MISO) system, 
which embodies a Gaussian broadcast channel (GBC). It 
is known that for the GBC, nonlinear dirty paper coding 
(DPC) is capacity achieving [2]. However, due to the high 
computational complexity of DPC, linear precoding schemes 
such as minimum mean square error (MMSE) precoding are 
attractive alternatives. Moreover, in the asymptotic scenario, 
where the number of base station antennas, N, and the 
number of user terminals (UTs), K, are very large but K is 
significantly smaller than N, the linear MMSE precoder with 
complex Gaussian transmit symbols achieves near optimum 
performance in terms of the sum rate [1]. On the other hand, 
in DL MU-MISO systems, if the number of UTs, K, is very 
large, and K is much larger than the number of base station 
antennas, N, so-called semi-orthogonal user selection zero-forcing 
(SUS-ZF) precoding achieves the same asymptotic 
sum rate as DPC [3]. 

For code division multiple access (CDMA) systems and 
single-user MIMO (SU-MIMO) systems with improper transmit 
symbols, i.e., transmit symbols with non-zero pseudo-covariance, 
it has been shown that so-called widely-linear 
MMSE (WL-MMSE) detectors outperform conventional 
MMSE detectors [4]. In a WL-MMSE detector, both the received 
signal and its complex conjugate are filtered separately 
and independently, and the filter outputs are combined [5], [6]. 
Recently, the authors of [6] introduced a joint optimization 
approach for designing WL precoders and detectors for SU-MIMO 
systems. However, in MU-MISO systems, due to their 
decentralized structure, the application of WL detectors such 
as those proposed in [6] is not possible. In this letter, we 
propose a WL precoding scheme for MU-MISO systems, 
which uses real-valued data symbols and does not require any 
signal processing at the UTs. The goal of the optimization 
is the minimization of the sum mean square error (sum 
MSE) between the real part of the received symbols and 
the real-valued data symbols under a sum transmit power 
constraint. However, the obtained solution still entails a high 
computational complexity due to the required inversion of 
a large matrix. To overcome this problem, we exploit the 
large system properties of large-scale MU-MISO systems and 
extend the results of [7] to approximate the matrix inversion 
in the WL-MMSE precoder by a matrix polynomial. Finally, 
using results from random matrix theory, we obtain analytical 
expressions for the asymptotic signal-to-interference-plus-noise 
ratio (SINR) and the asymptotic sum rate. 

The contributions of this letter are summarized as follows. 
First, we develop a WL-MMSE precoder for real-valued 
transmit symbols and show that it yields a substantially higher 
sum rate than the commonly used MMSE precoder, when the 
number of UTs is larger than the number of base station 
antennas. This is different from the work in [8], where a 
framework for calculation of strictly linear MMSE downlink 
transceiver filters from uplink filters was introduced. Second, 
in contrast to the existing WL-MMSE transceivers, where 
signal processing is performed both at the transmitter and 
the receiver, in our proposed scheme, signal processing at 
the receiver is not required. This makes the proposed scheme 
attractive for decentralized applications, i.e., MU-MISO systems. 
Third, using results from random matrix theory, we also 
propose a polynomial expansion (PE) WL-MMSE precoder, 
which is based on a matrix polynomial instead of matrix 
inversion and further reduces the computational complexity. 
Fourth, our numerical results show that the proposed PE WL- 
MMSE precoder achieves a sum rate which is very close to the 
sum rate of the SUS-ZF precoder proposed in [3] but entails 
a lower computational complexity. We consider SUS-ZF as a 
performance benchmark because of its excellent performance, 
when the number of UTs is larger than the number of base 
station antennas. 

Notation: Boldface lower and upper case letters represent 
column vectors and matrices, respectively. $\mathrm{diag}(Q_1, \dots, Q_K)$ 
is a diagonal matrix with scalars $Q_1, \dots, Q_K$ on its main 
diagonal. $\mathbf{I}_K$ denotes the $K \times K$ identity matrix, and $[\mathbf{A}]_{m,:}$, 
$[\mathbf{A}]_{:,n}$, and $[\mathbf{A}]_{m,n}$ stand for the $m$th row, the $n$th column, 
and the element in the $m$th row and the $n$th column of 
matrix $\mathbf{A}$, respectively. $(\cdot)^*$ denotes the complex conjugate, and 
$\mathrm{tr}(\cdot)$, $(\cdot)^T$, and $(\cdot)^H$ are the trace, transpose, and Hermitian 
transpose of a matrix, respectively. $\Re\{\cdot\}$ stands for the real 
part of a complex variable and $\|\mathbf{a}\|$ represents the Euclidean 
norm of vector $\mathbf{a}$. $\mathrm{E}\{\cdot\}$ refers to the expectation operator and 
$\mathcal{CN}(\mathbf{m}, \boldsymbol{\Phi})$ denotes a circular symmetric complex Gaussian 
distribution with mean vector $\mathbf{m}$ and covariance matrix $\boldsymbol{\Phi}$. 

II. System Model 

Fig. 1. Downlink augmented real-valued system model. 

Fig. 2. Dual uplink augmented real-valued system model. 

We consider the downlink of a single-cell large-scale MU-MISO 
system, where a base station with $N$ antennas transmits 
signals to $K$ single-antenna UTs which are randomly and 
uniformly distributed within the cell. All UTs share the 
same time and frequency resources. $N$ and $K$ are assumed to 
be large with their ratio $\beta = K/N$ being constant. We consider 
a flat fading channel, and we further assume that the channel 
state information (CSI) is perfectly known at the transmitter. 
The real-valued, independent and identically distributed (i.i.d.) 
zero-mean unit-variance Gaussian data symbols of the $K$ 
UTs are stacked into vector $\mathbf{d} = [d_1 \dots d_K]^T \in \mathbb{R}^{K}$ with 
$\mathrm{E}\{\mathbf{d}\mathbf{d}^T\} = \mathbf{I}_K$.¹ The vector of the stacked detected symbols 
of all UTs is given by 

$$\hat{\mathbf{d}} = \Re\left\{\mathbf{P}^{-1/2}\boldsymbol{\Delta}\mathbf{H}\mathbf{V}\mathbf{P}^{1/2}\mathbf{d} + \mathbf{P}^{-1/2}\boldsymbol{\Delta}\mathbf{n}\right\}, \quad (1)$$

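where channel matrix $\mathbf{H}$ models i.i.d. Rayleigh fading with 
$[\mathbf{H}]_{m,n} \sim \mathcal{CN}(0,1)$. $\mathbf{V} \in \mathbb{C}^{N \times K}$ is the normalized precoding 
matrix with unit norm columns. $\boldsymbol{\Delta} = \mathrm{diag}(\delta_1, \dots, \delta_K)$ 
contains real-valued scaling factors for all UTs and $\mathbf{P} = \mathrm{diag}(P_1, \dots, P_K)$ 
is the power allocation matrix with $P_i$ 
being the $i$th UT's transmit power. $\mathbf{n} = [n_1 \dots n_K]^T \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_K)$ 
is an additive white Gaussian noise (AWGN) 
vector whose entries have variance $\sigma^2$. Using augmented real-valued 
vectors and matrices, $\hat{\mathbf{d}}$ can be equivalently expressed as 

$$\hat{\mathbf{d}} = \mathbf{P}^{-1/2}\boldsymbol{\Delta}\bar{\mathbf{H}}\bar{\mathbf{V}}\mathbf{P}^{1/2}\mathbf{d} + \mathbf{P}^{-1/2}\boldsymbol{\Delta}\mathbf{n}_R, \quad (2)$$

where $\bar{\mathbf{H}} = [\mathbf{H}_R \ \ {-\mathbf{H}_I}]$ and $\bar{\mathbf{V}} = [\mathbf{V}_R^T \ \ \mathbf{V}_I^T]^T$. Here, 
$\mathbf{H}_R$/$\mathbf{H}_I$ and $\mathbf{V}_R$/$\mathbf{V}_I$ are the real/imaginary parts of $\mathbf{H}$ and 
$\mathbf{V}$, respectively. Furthermore, $\mathbf{n}_R = [n_{R,1} \dots n_{R,K}]^T$ is the real 
part of the noise vector $\mathbf{n}$ with variance $\sigma_R^2 = 0.5\sigma^2$. In Fig. 1, 
the block diagram of the downlink augmented real-valued 
system model is shown. The design goal in this work is the 
optimization of $\mathbf{V}$ for the minimization of the sum MSE. The 
corresponding optimization problem can be formulated as 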

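$$\min_{\mathbf{V}} \ \mathrm{E}\left\{\|\hat{\mathbf{d}} - \mathbf{d}\|^2\right\} \quad (3)$$

$$\text{subject to: } \mathrm{tr}\left(\mathbf{P}\mathbf{V}^H\mathbf{V}\right) = P_{\mathrm{TX}}, \quad P_k > 0, \ \forall k \in \{1, \dots, K\},$$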


where $P_{\mathrm{TX}}$ denotes the joint transmit power budget of all UTs. 
Because of the coupling of the different UTs introduced by 
the precoding matrix, the constrained downlink optimization 
problem in (3) is difficult to solve. In contrast, in the uplink, 
each vector of the detection matrix can be optimized separately 
and the corresponding optimization problem is easier to solve. 
In the next section, we exploit results from uplink/downlink 
duality to transform the original downlink system into its 
equivalent uplink counterpart and solve the much simpler 
optimization problem in the uplink [8]. 
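The equivalence between the complex-valued model (1) and the augmented real-valued model (2) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`matmul`, `augment_channel`, `augment_precoder`, `demo`) are illustrative and not taken from the letter, and the real scaling matrices and the noise term are omitted because they act identically in both forms.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def augment_channel(H):
    """Return the K x 2N real matrix [H_R, -H_I]."""
    return [[h.real for h in row] + [-h.imag for h in row] for row in H]

def augment_precoder(V):
    """Return the 2N x K real matrix [V_R^T, V_I^T]^T."""
    return [[v.real for v in row] for row in V] + [[v.imag for v in row] for row in V]

def demo(K=3, N=4, seed=0):
    rng = random.Random(seed)
    s = 0.5 ** 0.5  # per-dimension standard deviation of a CN(0, 1) entry
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)] for _ in range(K)]
    V = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(K)] for _ in range(N)]
    d = [[rng.gauss(0, 1)] for _ in range(K)]  # real-valued data symbols
    # Real part of the complex product H V d ...
    complex_out = [row[0].real for row in matmul(matmul(H, V), d)]
    # ... versus the purely real augmented product.
    real_out = [row[0] for row in matmul(matmul(augment_channel(H), augment_precoder(V)), d)]
    return complex_out, real_out
```

Both returned vectors agree up to floating-point rounding, which is the identity $\Re\{\mathbf{H}\mathbf{V}\} = \mathbf{H}_R\mathbf{V}_R - \mathbf{H}_I\mathbf{V}_I$ behind (2).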

III. Widely-Linear Precoding 
In this section, we use the uplink/downlink duality to derive 
the WL precoders. First, we derive the optimal WL-MMSE 
precoder in Section III-A. Then, in Section III-B, we introduce 
the low-complexity PE WL-MMSE precoder. 


¹Throughout this letter, we assume real-valued Gaussian data symbols for 
WL-MMSE precoding, whereas for conventional ZF and MMSE precoding, 
which are considered as benchmark schemes, complex-valued Gaussian data 
symbols are assumed as usual. 


A. Optimal WL-MMSE Precoding 

One of the main results of the uplink/downlink duality 
of sum MSE minimization in MU-MISO systems states that, 
under the same sum power constraint, the same sum MSE 
can be achieved in the downlink system as in the equivalent 
dual uplink system if the power allocation, precoding, and 
detection matrices are chosen appropriately [8]. For complex-valued 
system models, the dual uplink system model is obtained 
by adopting the Hermitian transposes of the downlink 
channel and precoding matrices as the uplink channel and 
detection matrices, respectively. Here, we extend this concept 
to augmented real-valued system models, cf. Fig. 2. In the 
dual uplink system model, depicted in Fig. 2, the sum MSE 
minimization problem can be formulated as 

$$\min \ \mathrm{E}\left\{\|\hat{\mathbf{d}} - \mathbf{d}\|^2\right\} \quad (4)$$

$$\text{subject to: } \mathrm{tr}(\mathbf{Q}) = P_{\mathrm{TX}}, \quad Q_k > 0, \ \forall k \in \{1, \dots, K\},$$

where $\hat{\mathbf{d}}$ can be expressed as 

$$\hat{\mathbf{d}} = \mathbf{Q}^{-1/2}\boldsymbol{\Delta}\mathbf{U}\bar{\mathbf{H}}^T\mathbf{Q}^{1/2}\mathbf{d} + \mathbf{Q}^{-1/2}\boldsymbol{\Delta}\mathbf{U}\mathbf{n}_R. \quad (5)$$


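Here, $\mathbf{Q}$ is the power allocation matrix and $\mathbf{U}$ is the normalized 
version of the detection matrix $\tilde{\mathbf{U}} = \boldsymbol{\Delta}\mathbf{U}$ with unit 
norm rows, where diagonal matrix $\boldsymbol{\Delta}$ contains the norms of 
the rows of $\tilde{\mathbf{U}}$. We define the signal-to-noise ratio (SNR) 
as $\mathrm{SNR} = P_{\mathrm{TX}}/\sigma^2$. To focus on the precoder design, 
we assume that all UTs transmit with equal powers, i.e., 
$\mathbf{Q} = (P_{\mathrm{TX}}/K)\mathbf{I}_K$. This makes the problem in (4) convex 
with $\mathbf{U} = \boldsymbol{\Delta}^{-1}\tilde{\mathbf{U}}$ as the optimum solution, where $\tilde{\mathbf{U}}$ is given 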
by 

$$\tilde{\mathbf{U}} = \bar{\mathbf{H}}\left(\bar{\mathbf{H}}^T\bar{\mathbf{H}} + \frac{K\sigma^2}{2P_{\mathrm{TX}}}\mathbf{I}_{2N}\right)^{-1}. \quad (6)$$


Now, exploiting the uplink/downlink duality, the normalized 
precoding matrix is obtained as [8] 

$$\bar{\mathbf{V}} = \mathbf{U}^T. \quad (7)$$


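We further use the uplink/downlink duality to obtain the dual 
counterpart of the power allocation in the downlink as $\mathbf{P} = \mathrm{diag}(\mathbf{p})$, 
where vector $\mathbf{p}$ is given by [9, Eq. (10.44)], 

$$\mathbf{p} = \sigma_R^2\left(\mathbf{I}_K - \mathbf{B}^T\right)^{-1}\mathbf{b}. \quad (8)$$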


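Here, matrix $\mathbf{B}$ is defined as $[\mathbf{B}]_{m,n} = b_n\left|[\bar{\mathbf{H}}]_{n,:}[\bar{\mathbf{V}}]_{:,m}\right|^2$, 
the elements of vector $\mathbf{b} = [b_1 \dots b_K]^T$ are given by [9] 

$$b_k = \frac{\mathrm{SINR}_k^{\mathrm{UL}}}{\left(1 + \mathrm{SINR}_k^{\mathrm{UL}}\right)\left|[\bar{\mathbf{H}}]_{k,:}[\bar{\mathbf{V}}]_{:,k}\right|^2}, \quad (9)$$

and 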


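$\mathrm{SINR}_k^{\mathrm{UL}}$ is the SINR at the $k$th UT in the uplink and is 
defined as 

$$\mathrm{SINR}_k^{\mathrm{UL}} = \frac{Q_k\left|[\mathbf{U}]_{k,:}[\bar{\mathbf{H}}^T]_{:,k}\right|^2}{0.5\sigma^2\left\|[\mathbf{U}]_{k,:}\right\|^2 + \sum_{i=1, i \neq k}^{K} Q_i\left|[\mathbf{U}]_{k,:}[\bar{\mathbf{H}}^T]_{:,i}\right|^2}. \quad (10)$$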

Finally, the normalized WL precoding matrix $\mathbf{V} = \mathbf{V}_R + j\mathbf{V}_I$ 
can be constructed from its augmented version $\bar{\mathbf{V}} = [\mathbf{V}_R^T \ \ \mathbf{V}_I^T]^T$. 
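The last steps of this subsection (row normalization of the detector, the transpose in (7), and the reassembly of the complex precoder from its augmented version) can be sketched as follows. This is an illustration only: the function name `detector_to_precoder` and the list-of-rows matrix representation are assumptions made here, and the power allocation (8) is not included.

```python
import math

def detector_to_precoder(U_tilde):
    """Map an unnormalized K x 2N real detector (list of rows) to
    (delta, V): the row norms and the N x K complex precoder."""
    K = len(U_tilde)
    N = len(U_tilde[0]) // 2
    # Diagonal scaling factors: norms of the rows of the detector.
    delta = [math.sqrt(sum(u * u for u in row)) for row in U_tilde]
    # Normalized detector with unit-norm rows.
    U = [[u / dk for u in row] for row, dk in zip(U_tilde, delta)]
    # Augmented precoder as the transpose of the normalized detector, cf. (7).
    V_bar = [list(col) for col in zip(*U)]
    # Upper half of V_bar holds the real part, lower half the imaginary part.
    V = [[complex(V_bar[n][k], V_bar[N + n][k]) for k in range(K)] for n in range(N)]
    return delta, V
```

Because the rows of the normalized detector have unit norm, the columns of the resulting complex precoder have unit norm as well, as required of the normalized precoding matrix.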

B. PE WL-MMSE Precoding 

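In the following, we derive the PE WL-MMSE precoder. To 
this end, we approximate the matrix inversion in the detector 
matrix $\tilde{\mathbf{U}}$ by a matrix polynomial and rewrite (6) as 

$$\tilde{\mathbf{U}}_{\mathrm{PE}} = \bar{\mathbf{H}}\sum_{l=0}^{L}\omega_l\left(\frac{1}{N}\bar{\mathbf{H}}^T\bar{\mathbf{H}}\right)^{l}. \quad (11)$$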
Adopting the minimization of the average energy of the 
difference between the WL-MMSE detector's output and the 
PE WL-MMSE detector's output as the optimization objective, 
the optimal coefficients $\boldsymbol{\omega} = [\omega_0 \dots \omega_L]^T$ are calculated as 
$\boldsymbol{\omega} = \boldsymbol{\Xi}^{-1}\boldsymbol{\varphi}$, where the elements of matrix $\boldsymbol{\Xi}$ are given by 

$$[\boldsymbol{\Xi}]_{m,n} = \lambda^{(m+n)} + \frac{\sigma_R^2 K}{P_{\mathrm{TX}} N}\lambda^{(m+n-1)}, \quad (12)$$

and the elements of vector $\boldsymbol{\varphi}$ are defined as $[\boldsymbol{\varphi}]_m = \lambda^{(m)}$. 
Here, $\lambda^{(m)}$ is the $m$th order moment of the eigenvalues of the 
large matrix $\frac{1}{N}\bar{\mathbf{H}}^T\bar{\mathbf{H}}$ and given by [11, Theorem 1], 

$$\lambda^{(m)} \xrightarrow[K,N \to \infty]{} \sum_{i=0}^{m-1}\binom{m}{i}\binom{m}{i+1}\frac{\beta^{i}}{m}. \quad (13)$$


The optimal coefficient vector $\boldsymbol{\omega}$ can be calculated easily 
and does not depend on the instantaneous realizations of $\mathbf{H}$. 
The PE WL-MMSE precoder is then obtained by replacing $\tilde{\mathbf{U}}$ 
with $\tilde{\mathbf{U}}_{\mathrm{PE}}$ in Section III-A. Exploiting the structure of (11) 
and applying Horner's scheme, the PE WL-MMSE precoded 
data vectors can be calculated by performing matrix-vector 
multiplications only while avoiding matrix-matrix multiplications, 
see [7], [10] for details. This leads to a computational 
complexity of $O(KN)$ for calculation of one precoded data 
vector. 
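Horner's scheme for the matrix polynomial can be sketched as follows. The sketch assumes that the PE detector has the form $\bar{\mathbf{H}}\sum_{l}\omega_l(\frac{1}{N}\bar{\mathbf{H}}^T\bar{\mathbf{H}})^l$, so that the precoded vector is the polynomial applied to $\bar{\mathbf{H}}^T\mathbf{s}$ for a length-$K$ real vector $\mathbf{s}$ of (scaled) data symbols. The names `matvec`, `mat_t_vec`, and `pe_precode` are illustrative, and the coefficients `omega` are taken as given.

```python
def matvec(A, x):
    """Compute A x for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_t_vec(A, x):
    """Compute A^T x without forming the transpose."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]

def pe_precode(H_bar, omega, s):
    """Return sum_l omega[l] * X^l * (H_bar^T s) with X = (1/N) H_bar^T H_bar,
    where H_bar is K x 2N, using matrix-vector products only."""
    N = len(H_bar[0]) // 2
    t = mat_t_vec(H_bar, s)                 # 2N-vector H_bar^T s
    x = [omega[-1] * ti for ti in t]        # start with the highest-order term
    for w in reversed(omega[:-1]):
        # X x is evaluated as H_bar^T (H_bar x) / N: two matrix-vector products.
        Xx = [v / N for v in mat_t_vec(H_bar, matvec(H_bar, x))]
        x = [a + w * ti for a, ti in zip(Xx, t)]
    return x
```

Each Horner step costs two products with a $K \times 2N$ matrix, so the total cost grows as $O(LKN)$ for polynomial order $L$, and the $2N \times 2N$ matrix $\frac{1}{N}\bar{\mathbf{H}}^T\bar{\mathbf{H}}$ is never formed explicitly.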

IV. Large System Analysis 


In this section, we use results from random matrix theory 
to derive asymptotic expressions for the UTs' SINRs and the 
sum rate in the downlink of a large-scale MU-MISO system. 
Since the SINRs in the downlink and the dual uplink system 
are identical [8] and since the large system analysis 
of the detector in the uplink is simpler than the analysis of the 
downlink precoder, we derive the asymptotic SINRs for the 
dual uplink model. First, we analyze the SINRs in the uplink 
for conventional MMSE detection. The corresponding detected 
signal of the $k$th UT can be expressed as 

$$\begin{aligned}
\hat{d}_k^{\mathrm{MMSE}} ={}& [\mathbf{H}]_{k,:}\left(\mathbf{H}^H\mathbf{H} + \frac{\sigma^2}{\rho}\mathbf{I}_N\right)^{-1}[\mathbf{H}]_{k,:}^H\, d_k \\
&+ \sum_{j=1, j \neq k}^{K} [\mathbf{H}]_{k,:}\left(\mathbf{H}^H\mathbf{H} + \frac{\sigma^2}{\rho}\mathbf{I}_N\right)^{-1}[\mathbf{H}]_{j,:}^H\, d_j \\
&+ \frac{1}{\sqrt{\rho}}[\mathbf{H}]_{k,:}\left(\mathbf{H}^H\mathbf{H} + \frac{\sigma^2}{\rho}\mathbf{I}_N\right)^{-1}\mathbf{n},
\end{aligned} \quad (14)$$

where $\rho = P_{\mathrm{TX}}/K$. Now, we define the following variables 


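$$\xi_k = \lim_{K,N \to \infty} \frac{1}{N}[\mathbf{H}]_{k,:}\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-1}[\mathbf{H}]_{k,:}^H, \quad (15)$$

$$\psi_k = \lim_{K,N \to \infty} \frac{1}{N}[\mathbf{H}]_{k,:}\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-2}[\mathbf{H}]_{k,:}^H, \quad (16)$$

$$\begin{aligned}
\zeta_k = \lim_{K,N \to \infty} \frac{1}{N}[\mathbf{H}]_{k,:}&\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-1}\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k \\
&\times \left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-1}[\mathbf{H}]_{k,:}^H,
\end{aligned} \quad (17)$$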

where $\gamma = \sigma^2/(\rho N)$ and $\mathbf{H}_k$ is identical to matrix $\mathbf{H}$ with the 
$k$th row removed. Here, $\xi_k$, $\gamma\psi_k$, and $\zeta_k$ are the asymptotic 
values of the magnitude of the useful signal, noise power, and 
interference power of the $k$th UT for $K, N \to \infty$, respectively. 
Using the above defined variables, the asymptotic value of the 
SINR of the $k$th UT for $K, N \to \infty$ can be expressed as 

$$\mathrm{SINR}_k^{\mathrm{MMSE}\circ}(\beta, \gamma) = \frac{\xi_k^2}{\zeta_k + \gamma\psi_k}. \quad (18)$$


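Now, exploiting [12, Corollary 1] yields 

$$\begin{aligned}
\xi_k &= \lim_{K,N \to \infty} \frac{1}{N}[\mathbf{H}]_{k,:}\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-1}[\mathbf{H}]_{k,:}^H \\
&\overset{\mathrm{a.s.}}{=} \lim_{K,N \to \infty} \frac{1}{N}\mathrm{tr}\left(\left(\boldsymbol{\Lambda} + \gamma\mathbf{I}_N\right)^{-1}\right) = \int_{-\infty}^{\infty}\frac{dF_{\Lambda}(s)}{s + \gamma} = H_{\Lambda}(\beta, -\gamma) \\
&= \sqrt{\frac{(1-\beta)^2}{4\gamma^2} + \frac{1+\beta}{2\gamma} + \frac{1}{4}} + \frac{1-\beta}{2\gamma} - \frac{1}{2},
\end{aligned} \quad (19)$$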


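where $dF_{\Lambda}(s)$ is the empirical distribution of the eigenvalues 
of $\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}^H$ with $\boldsymbol{\Lambda}$ and $\mathbf{T}$ being the matrix of 
eigenvalues and the matrix of eigenvectors, respectively. Here, 
the Stieltjes transform of $dF_{\Lambda}(s)$ is denoted by $H_{\Lambda}(\beta, \lambda) = \int_{-\infty}^{\infty}(s - \lambda)^{-1}dF_{\Lambda}(s)$. 
Using [12, Corollary 1], applying 
the above mentioned eigen-decomposition, and considering 
$\mathbf{T}\mathbf{T}^H = \mathbf{I}_N$, the following expression is obtained for $\psi_k$ 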


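$$\begin{aligned}
\psi_k &= \lim_{K,N \to \infty} \frac{1}{N}[\mathbf{H}]_{k,:}\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-2}[\mathbf{H}]_{k,:}^H \overset{\mathrm{a.s.}}{=} \lim_{K,N \to \infty} \frac{1}{N}\mathrm{tr}\left(\left(\boldsymbol{\Lambda} + \gamma\mathbf{I}_N\right)^{-2}\right) \\
&= \int_{-\infty}^{\infty}\frac{dF_{\Lambda}(s)}{(s + \gamma)^2} = -\frac{\partial H_{\Lambda}(\beta, -\gamma)}{\partial\gamma}.
\end{aligned} \quad (20)$$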


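Using a similar procedure and performing algebraic operations, 
$\zeta_k$ can be expressed as [12], [13] 

$$\begin{aligned}
\zeta_k &= \lim_{K,N \to \infty} \frac{1}{N}[\mathbf{H}]_{k,:}\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-1}\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k\left(\frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\right)^{-1}[\mathbf{H}]_{k,:}^H \\
&\overset{\mathrm{a.s.}}{=} \lim_{K,N \to \infty} \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Lambda}\left(\boldsymbol{\Lambda} + \gamma\mathbf{I}_N\right)^{-2}\right) = \int_{-\infty}^{\infty}\frac{s\, dF_{\Lambda}(s)}{(s + \gamma)^2} \\
&= \int_{-\infty}^{\infty}\frac{dF_{\Lambda}(s)}{s + \gamma} - \gamma\int_{-\infty}^{\infty}\frac{dF_{\Lambda}(s)}{(s + \gamma)^2} = H_{\Lambda}(\beta, -\gamma) + \gamma\frac{\partial H_{\Lambda}(\beta, -\gamma)}{\partial\gamma}.
\end{aligned} \quad (21)$$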
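As a consistency check on the three quantities defined in (15)-(17), here written as $\xi_k$ (useful signal), $\psi_k$ (noise), and $\zeta_k$ (interference), the denominator of the SINR expression (18) can be simplified directly from the definitions. The shorthand $\mathbf{A}_k = \frac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N$ is introduced only for this remark and does not appear elsewhere in the letter.

```latex
\begin{align*}
\zeta_k + \gamma\psi_k
 &= \lim_{K,N\to\infty} \frac{1}{N}[\mathbf{H}]_{k,:}\,\mathbf{A}_k^{-1}
    \Big(\tfrac{1}{N}\mathbf{H}_k^H\mathbf{H}_k + \gamma\mathbf{I}_N\Big)
    \mathbf{A}_k^{-1}[\mathbf{H}]_{k,:}^H \\
 &= \lim_{K,N\to\infty} \frac{1}{N}[\mathbf{H}]_{k,:}\,\mathbf{A}_k^{-1}[\mathbf{H}]_{k,:}^H
  = \xi_k ,
\end{align*}
% and therefore
\begin{equation*}
\frac{\xi_k^2}{\zeta_k + \gamma\psi_k} = \xi_k = H_{\Lambda}(\beta, -\gamma).
\end{equation*}
```

Under this observation, the asymptotic SINR with MMSE detection equals the Stieltjes transform evaluated at $-\gamma$, and the derivative terms in the noise and interference expressions cancel.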


Substituting (19)-(21) into (18) yields the asymptotic SINR 
of the $k$th UT in the uplink. Since this SINR is identical to 
the SINR of the $k$th UT in the dual downlink system, the 
asymptotic sum rate in the downlink with MMSE precoding 
and complex-valued Gaussian data symbols is given by 

$$R_{\mathrm{MMSE}} = \sum_{k=1}^{K}\log_2\left(1 + \mathrm{SINR}_k^{\mathrm{MMSE}\circ}(\beta, \gamma)\right). \quad (22)$$


Now, we are ready to provide the uplink SINR for WL-MMSE 
detection. 

Theorem 1: The asymptotic SINR of the $k$th UT in the uplink 
of a MU-MISO system with $K, N \to \infty$ using real-valued 
transmit symbols and WL-MMSE detection is given by 

$$\mathrm{SINR}_k^{\mathrm{WL\text{-}MMSE}\circ}(\beta, \gamma) = \mathrm{SINR}_k^{\mathrm{MMSE}\circ}(\beta/2, \gamma/2).$$

Proof: See Appendix. ■ 

Using the above theorem and the uplink/downlink duality, 
the sum rate of the downlink system using real-valued Gaussian 
data symbols and a WL-MMSE precoder can be expressed 
as 

$$R_{\mathrm{WL\text{-}MMSE}} = 0.5\sum_{k=1}^{K}\log_2\left(1 + \mathrm{SINR}_k^{\mathrm{MMSE}\circ}(\beta/2, \gamma/2)\right).$$
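The asymptotic expressions of this section can be evaluated with a few lines of code. The sketch below assumes the closed form of the Stieltjes transform in (19), approximates its derivative with respect to $\gamma$ by a central finite difference, and uses $\gamma = \sigma^2/(\rho N) = \beta/\mathrm{SNR}$, which follows from $\rho = P_{\mathrm{TX}}/K$ and $\mathrm{SNR} = P_{\mathrm{TX}}/\sigma^2$. The function names are illustrative.

```python
import math

def stieltjes(beta, gamma):
    """Closed form of H_Lambda(beta, -gamma), cf. (19)."""
    a = (1.0 - beta) / (2.0 * gamma)
    return math.sqrt(a * a + (1.0 + beta) / (2.0 * gamma) + 0.25) + a - 0.5

def asymptotic_sinr_mmse(beta, gamma, h=1e-6):
    """Asymptotic SINR (18) assembled from the three limits (19)-(21)."""
    xi = stieltjes(beta, gamma)
    # Central difference for the derivative of H_Lambda(beta, -gamma) w.r.t. gamma.
    dH = (stieltjes(beta, gamma + h) - stieltjes(beta, gamma - h)) / (2.0 * h)
    psi = -dH
    zeta = xi + gamma * dH
    return xi * xi / (zeta + gamma * psi)

def sum_rates(K, N, snr_db):
    """Return (R_MMSE, R_WL-MMSE) in bits per channel use."""
    beta = K / N
    gamma = beta / 10 ** (snr_db / 10)
    r_mmse = K * math.log2(1 + asymptotic_sinr_mmse(beta, gamma))
    r_wl = 0.5 * K * math.log2(1 + asymptotic_sinr_mmse(beta / 2, gamma / 2))
    return r_mmse, r_wl
```

For example, `sum_rates(150, 100, 20)` evaluates both expressions for an overloaded system with $K/N = 1.5$, and `sum_rates(50, 100, 20)` does so for $K/N = 0.5$, which allows the two regimes discussed in the numerical results to be compared.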

V. Numerical Results 

In order to evaluate the performance of the proposed WL 
precoder, Monte-Carlo simulations have been conducted. The 
noise variance is assumed to be $\sigma^2 = 1$. In Fig. 3, the 
ergodic sum rates of the MMSE, ZF, conjugate beamforming 
(BF), WL-MMSE, SUS-ZF [3], and WL-ZF precoders for 
SNR = 20 dB and $N = 100$ base station antennas are 
depicted. Conjugate BF is the downlink counterpart of the 
matched filter in the uplink. The ergodic sum rate is given 
by $R = \sum_{k=1}^{K}\mathrm{E}\{\log_2(1 + \mathrm{SINR}_k)\}$, where the expectation 
is approximated by averaging over a sufficient number of 
channel realizations. The WL-ZF precoding matrix is obtained 
by setting the detector matrix in the dual uplink model to 
$\tilde{\mathbf{U}} = (\bar{\mathbf{H}}\bar{\mathbf{H}}^T)^{-1}\bar{\mathbf{H}}$ and using the procedure described in 
Section III to obtain the precoder. 
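The Monte-Carlo averaging behind the ergodic sum rate can be illustrated with the simplest of the considered schemes, conjugate BF, which needs no matrix inversion. The sketch below is not the simulation setup of this letter: it assumes equal power $P_{\mathrm{TX}}/K$ per UT (instead of a duality-based power allocation), unit noise variance, and complex-valued symbols, and the function name and default number of trials are illustrative.

```python
import math
import random

def conj_bf_sum_rate(K, N, snr_db, trials=20, seed=0):
    """Monte-Carlo estimate of the ergodic sum rate of conjugate BF."""
    rng = random.Random(seed)
    sigma2 = 1.0
    p = (10 ** (snr_db / 10)) * sigma2 / K   # assumed equal power per UT
    s = math.sqrt(0.5)                        # per-dimension std of CN(0, 1)
    total = 0.0
    for _ in range(trials):
        H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
             for _ in range(K)]
        norms = [math.sqrt(sum(abs(h) ** 2 for h in row)) for row in H]
        for k in range(K):
            # Gain from precoding vector i (normalized conjugate of row i) to UT k.
            gains = [abs(sum(hk * hi.conjugate() for hk, hi in zip(H[k], H[i]))
                         / norms[i]) ** 2 for i in range(K)]
            sinr = p * gains[k] / (p * (sum(gains) - gains[k]) + sigma2)
            total += math.log2(1 + sinr)
    return total / trials
```

The same loop structure applies to the other precoders; only the computation of the precoding vectors and of the power allocation changes.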

As can be seen in Fig. 3, with increasing $K/N$, the 
difference in performance between conjugate BF and the other 
schemes increases until the load factor reaches $K/N = 0.7$, 
where the MMSE precoder achieves the highest sum rate 
among the considered schemes. For $K/N < 1$, 
the MMSE precoder outperforms the WL-MMSE precoder. 
This is due to the fact that for $K < N$, the base station 
has enough spatial degrees of freedom to efficiently suppress 
interference from $K - 1$ users if complex transmit symbols 
and MMSE precoding are employed. On the other hand, for 
$K < N$, the sum rate of the WL-MMSE precoder is compromised 
by the waste of dimensions caused by the limitation 
to real-valued transmit symbols. For $K > N$, the WL-MMSE 
precoder achieves a significantly higher sum rate than 
the conventional MMSE precoder. This occurs because the 
WL-MMSE precoder employs real-valued transmit symbols, 
which enables it to relegate the interference to the imaginary 
part of the received signal, making it invisible to the receiver 
that inspects only the real part of the observation. In addition, 
in contrast to the WL-ZF precoder's sum rate, which decreases 
significantly for $K/N > 1.5$, the sum rate of the WL-MMSE 
precoder is almost constant for $1.5 < K/N < 1.9$. In fact, the 
proposed WL-MMSE precoder closely approaches the sum 
rate of the SUS-ZF precoder [3]. Moreover, in contrast to 
the SUS-ZF precoder, where UTs with poor channels are 
allocated zero rate, with the proposed WL-MMSE precoder, 
all UTs are always served. Furthermore, in Fig. 3, we also 
present analytical results for the sum rate obtained from the 
large system analysis for the conventional MMSE and WL-MMSE 
precoders. A perfect match between analytical results 
and simulation results is observed. 

In Fig. 4, the sum rates of PE WL-MMSE precoders with 
different polynomial orders $L$ are compared to the sum rates 
of the BF and WL-MMSE precoders for SNR = 15 dB and 
$N = 50$. As can be observed, for increasing $L$, the PE WL-MMSE 
precoder approaches the sum rate of the WL-MMSE 
precoder. For example, for $L = 4$ and $K/N = 1.5$, the 
PE WL-MMSE precoder achieves almost 91% of the WL-MMSE 
precoder's sum rate and thereby also approaches the 
sum rate of the SUS-ZF precoder. However, for $K > N$, 
the computational complexity of calculating one precoding 
vector for PE WL-MMSE precoding and SUS-ZF precoding is 
$O(KN)$ and $O(KN^2)$ [3], respectively, i.e., PE WL-MMSE 
entails a lower complexity. 

Appendix - Proof of Theorem 1 

The detected signal in the uplink of a MU-MISO system using 
WL-MMSE detection is given by (14), but with $\mathbf{H}$, $\mathbf{n}$, and 
$\sigma^2$ being replaced by $\bar{\mathbf{H}}$, $\mathbf{n}_R$, and $0.5\sigma^2$, respectively. Thus, 
for the WL-MMSE detector, a similar SINR expression as for 
the conventional MMSE detector results. Furthermore, both 
matrices $\bar{\mathbf{H}}$ and $\mathbf{H}$ have zero-mean i.i.d. Gaussian distributed 
entries, but the dimension of $\bar{\mathbf{H}}$ is $K \times 2N$ whereas that of $\mathbf{H}$ 
is $K \times N$. Therefore, the Stieltjes transform of $dF_{\frac{1}{N}\bar{\mathbf{H}}^T\bar{\mathbf{H}}}(s)$ 
is obtained by replacing $\beta$ in the Stieltjes transform of 
$dF_{\frac{1}{N}\mathbf{H}^H\mathbf{H}}(s)$ by $K/(2N) = \beta/2$. Moreover, the SINRs in 
the uplink system using WL-MMSE and MMSE detection 
are only functions of the Stieltjes transforms of $dF_{\frac{1}{N}\bar{\mathbf{H}}^T\bar{\mathbf{H}}}(s)$ 
and $dF_{\frac{1}{N}\mathbf{H}^H\mathbf{H}}(s)$, and their derivatives with respect to $\gamma$, 
respectively. In addition, we have $\sigma_R^2 = 0.5\sigma^2$. Hence, the 
SINR in the uplink system using WL-MMSE detection is 
obtained by replacing $\beta$ with $\beta/2$ and $\gamma$ with $\gamma/2$ in the SINR 
expression of the uplink system using MMSE detection. 

Fig. 3. Sum rate vs. $K/N$ for SNR = 20 dB, $N = 100$. 

Fig. 4. Sum rate vs. $K/N$ for SNR = 15 dB, $N = 50$. 